Overview

Dataset statistics

Number of variables: 21
Number of observations: 998
Missing cells: 81
Missing cells (%): 0.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 565.6 KiB
Average record size in memory: 580.3 B
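The derived figures in this block can be re-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative, not taken from the report.

```python
# Raw figures as reported in the dataset statistics block.
n_variables = 21
n_observations = 998
missing_cells = 81
total_kib = 565.6

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the reported 0.4% and 580.3 B.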

Variable types

Text: 2
Categorical: 9
DateTime: 1
Numeric: 8
Boolean: 1

Dataset

Description: JHB_DPHRU_053 - Quality-corrected harmonized data
Creator: RP2 Clinical Data Quality Team
Author: Quality-Checked Data
URL: HEAT Research Projects

Variable descriptions

Age (at enrolment): Patient age at study enrollment
CD4 cell count (cells/µL): CD4+ T lymphocyte count (missing codes removed)
HIV viral load (copies/mL): HIV RNA copies per mL (missing codes removed)
BMI (kg/m²): Body Mass Index (extreme values removed)
Waist circumference (cm): Waist circumference (corrected from mm to cm)
ALT (U/L): Alanine aminotransferase (missing codes removed)
Platelet count (×10³/µL): Platelet count (missing codes removed)
Hematocrit (%): Hematocrit (zero values removed)
Lymphocyte count (×10⁹/L): Lymphocyte absolute count (corrected labeling)
Neutrophil count (×10⁹/L): Neutrophil absolute count (corrected labeling)
cd4_correction_applied: Quality flag: CD4 missing codes removed
final_comprehensive_fix_applied: Quality flag: Comprehensive corrections applied
waist_circ_unit_correction_applied: Quality flag: Waist circ unit corrected

Alerts

study_source has constant value "JHB_DPHRU_053" (Constant)
latitude has constant value "-26.2041" (Constant)
longitude has constant value "28.0473" (Constant)
province has constant value "Gauteng" (Constant)
city has constant value "Johannesburg" (Constant)
jhb_subregion has constant value "Central_JHB" (Constant)
cd4_correction_applied has constant value "0.0" (Constant)
final_comprehensive_fix_applied has constant value "1.0" (Constant)
waist_circ_unit_correction_applied has constant value "False" (Constant)
BMI (kg/m²) is highly overall correlated with Sex and 1 other field (High correlation)
Sex is highly overall correlated with BMI (kg/m²) and 1 other field (High correlation)
diastolic_bp_mmHg is highly overall correlated with systolic_bp_mmHg (High correlation)
height_m is highly overall correlated with Sex (High correlation)
systolic_bp_mmHg is highly overall correlated with diastolic_bp_mmHg (High correlation)
weight_kg is highly overall correlated with BMI (kg/m²) (High correlation)
total_cholesterol_mg_dL has 26 (2.6%) missing values (Missing)
Triglycerides (mg/dL) has 26 (2.6%) missing values (Missing)
anonymous_patient_id has unique values (Unique)
Patient ID has unique values (Unique)
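The Constant and Unique alerts come down to counting distinct values per column. The following is a minimal sketch of that idea in plain Python, not the profiler's own implementation; `classify_columns` is a hypothetical helper, and the three rows are illustrative values in the style of the report's samples.

```python
def classify_columns(rows):
    """Label each column 'Constant', 'Unique', or None from its distinct count."""
    alerts = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows if row[col] is not None]
        distinct = len(set(values))
        if distinct == 1:
            alerts[col] = "Constant"   # a single value repeated in every row
        elif distinct == len(rows):
            alerts[col] = "Unique"     # every row carries a different value
        else:
            alerts[col] = None
    return alerts


# Illustrative rows only; not an extract of the real dataset.
rows = [
    {"Patient ID": "GSK1001", "city": "Johannesburg", "Sex": "Female"},
    {"Patient ID": "GSK1003", "city": "Johannesburg", "Sex": "Female"},
    {"Patient ID": "GSK1004", "city": "Johannesburg", "Sex": "Male"},
]
print(classify_columns(rows))
```

Constant columns such as `city` carry no information within a single-site dataset and are usually dropped before modelling; unique columns such as `Patient ID` are identifiers rather than features.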

Reproduction

Analysis started: 2025-11-24 21:49:13.964011
Analysis finished: 2025-11-24 21:49:16.702402
Duration: 2.74 seconds
Software version: ydata-profiling v4.18.0
Download configuration: config.json
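The reported duration follows directly from the two timestamps. A short check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2025-11-24 21:49:13.964011")
finished = datetime.fromisoformat("2025-11-24 21:49:16.702402")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```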

Variables

anonymous_patient_id
Text

Unique

Distinct: 998
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 72.1 KiB

Length

Max length: 17
Median length: 17
Mean length: 17
Min length: 17

Characters and Unicode

Total characters: 16966
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
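As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for the first sample identifier in this report; it covers categories only, since the standard library does not expose script or block names.

```python
import unicodedata
from collections import Counter

sample = "HEAT_EFCE0743072E"  # first sample row of this variable

# Lu = Uppercase Letter, Nd = Decimal Number, Pc = Connector Punctuation
categories = Counter(unicodedata.category(ch) for ch in sample)
print(dict(categories))
```

The three categories found here are the same three that the report lists for the whole column.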

Unique

Unique: 998
Unique (%): 100.0%

Sample

1st row: HEAT_EFCE0743072E
2nd row: HEAT_F3BA2B285DB1
3rd row: HEAT_2B8BDBC0C1EE
4th row: HEAT_E3EC25AD8189
5th row: HEAT_17FEBF78F855
Value                 Count  Frequency (%)
heat_efce0743072e         1  0.1%
heat_a402e1bca908         1  0.1%
heat_2ec5321500fb         1  0.1%
heat_b216e9aa2d15         1  0.1%
heat_2b8bdbc0c1ee         1  0.1%
heat_e3ec25ad8189         1  0.1%
heat_17febf78f855         1  0.1%
heat_5fe7a2fc6a9c         1  0.1%
heat_0058eddd14fa         1  0.1%
heat_a686658cd4f5         1  0.1%
Other values (988)      988  99.0%

Most occurring characters

Value               Count  Frequency (%)
E                    1796  10.6%
A                    1749  10.3%
H                     998  5.9%
T                     998  5.9%
_                     998  5.9%
8                     785  4.6%
4                     783  4.6%
3                     777  4.6%
2                     766  4.5%
0                     765  4.5%
Other values (9)     6551  38.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter8454
49.8%
Decimal Number7514
44.3%
Connector Punctuation998
 
5.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
8785
10.4%
4783
10.4%
3777
10.3%
2766
10.2%
0765
10.2%
9765
10.2%
5739
9.8%
6723
9.6%
1721
9.6%
7690
9.2%
Uppercase Letter
ValueCountFrequency (%)
E1796
21.2%
A1749
20.7%
H998
11.8%
T998
11.8%
D746
8.8%
C735
8.7%
F732
8.7%
B700
 
8.3%
Connector Punctuation
ValueCountFrequency (%)
_998
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common8512
50.2%
Latin8454
49.8%

Most frequent character per script

Common
ValueCountFrequency (%)
_998
11.7%
8785
9.2%
4783
9.2%
3777
9.1%
2766
9.0%
0765
9.0%
9765
9.0%
5739
8.7%
6723
8.5%
1721
8.5%
Latin
ValueCountFrequency (%)
E1796
21.2%
A1749
20.7%
H998
11.8%
T998
11.8%
D746
8.8%
C735
8.7%
F732
8.7%
B700
 
8.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII16966
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E1796
 
10.6%
A1749
 
10.3%
H998
 
5.9%
T998
 
5.9%
_998
 
5.9%
8785
 
4.6%
4783
 
4.6%
3777
 
4.6%
2766
 
4.5%
0765
 
4.5%
Other values (9)6551
38.6%

Patient ID
Text

Unique 

Distinct998
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Memory size63.7 KiB

Length

Max length10
Median length8
Mean length8.3396794
Min length7

Characters and Unicode

Total characters8323
Distinct characters13
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique998 ?
Unique (%)100.0%

Sample

1st rowGSK1001
2nd rowGSK1003
3rd rowGSK1004
4th rowGSK1006
5th rowGSK1007
ValueCountFrequency (%)
gsk10011
 
0.1%
gsk10191
 
0.1%
gsk10451
 
0.1%
gsk10421
 
0.1%
gsk10041
 
0.1%
gsk10061
 
0.1%
gsk10071
 
0.1%
gsk10081
 
0.1%
gsk10101
 
0.1%
gsk10111
 
0.1%
Other values (988)988
99.0%

Most occurring characters

ValueCountFrequency (%)
11122
13.5%
G998
12.0%
S998
12.0%
K998
12.0%
0592
7.1%
3525
6.3%
2505
6.1%
4498
6.0%
5476
 
5.7%
6406
 
4.9%
Other values (3)1205
14.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5329
64.0%
Uppercase Letter2994
36.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
11122
21.1%
0592
11.1%
3525
9.9%
2505
9.5%
4498
9.3%
5476
8.9%
6406
 
7.6%
9405
 
7.6%
7404
 
7.6%
8396
 
7.4%
Uppercase Letter
ValueCountFrequency (%)
G998
33.3%
S998
33.3%
K998
33.3%

Most occurring scripts

ValueCountFrequency (%)
Common5329
64.0%
Latin2994
36.0%

Most frequent character per script

Common
ValueCountFrequency (%)
11122
21.1%
0592
11.1%
3525
9.9%
2505
9.5%
4498
9.3%
5476
8.9%
6406
 
7.6%
9405
 
7.6%
7404
 
7.6%
8396
 
7.4%
Latin
ValueCountFrequency (%)
G998
33.3%
S998
33.3%
K998
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII8323
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11122
13.5%
G998
12.0%
S998
12.0%
K998
12.0%
0592
7.1%
3525
6.3%
2505
6.1%
4498
6.0%
5476
 
5.7%
6406
 
4.9%
Other values (3)1205
14.5%

study_source
Categorical

Constant 

Distinct1
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size68.2 KiB
JHB_DPHRU_053
998 

Length

Max length13
Median length13
Mean length13
Min length13

Characters and Unicode

Total characters12974
Distinct characters11
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJHB_DPHRU_053
2nd rowJHB_DPHRU_053
3rd rowJHB_DPHRU_053
4th rowJHB_DPHRU_053
5th rowJHB_DPHRU_053

Common Values

ValueCountFrequency (%)
JHB_DPHRU_053998
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
jhb_dphru_053998
100.0%

Most occurring characters

ValueCountFrequency (%)
H1996
15.4%
_1996
15.4%
J998
7.7%
B998
7.7%
D998
7.7%
P998
7.7%
R998
7.7%
U998
7.7%
0998
7.7%
5998
7.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter7984
61.5%
Decimal Number2994
 
23.1%
Connector Punctuation1996
 
15.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
H1996
25.0%
J998
12.5%
B998
12.5%
D998
12.5%
P998
12.5%
R998
12.5%
U998
12.5%
Decimal Number
ValueCountFrequency (%)
0998
33.3%
5998
33.3%
3998
33.3%
Connector Punctuation
ValueCountFrequency (%)
_1996
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7984
61.5%
Common4990
38.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
H1996
25.0%
J998
12.5%
B998
12.5%
D998
12.5%
P998
12.5%
R998
12.5%
U998
12.5%
Common
ValueCountFrequency (%)
_1996
40.0%
0998
20.0%
5998
20.0%
3998
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII12974
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
H1996
15.4%
_1996
15.4%
J998
7.7%
B998
7.7%
D998
7.7%
P998
7.7%
R998
7.7%
U998
7.7%
0998
7.7%
5998
7.7%
Distinct301
Distinct (%)30.2%
Missing0
Missing (%)0.0%
Memory size15.6 KiB
Minimum2017-01-23 00:00:00
Maximum2018-07-24 00:00:00
Invalid dates0
Invalid dates (%)0.0%
Histogram with fixed size bins (bins=50)

Age (at enrolment)
Real number (ℝ)

Patient age at study enrollment

Distinct: 29
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.599198
Minimum: 41
Maximum: 71
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 41
5th percentile: 44
Q1: 49
Median: 53
Q3: 59
95th percentile: 63
Maximum: 71
Range: 30
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 5.9694709
Coefficient of variation (CV): 0.11137239
Kurtosis: -0.97067221
Mean: 53.599198
Median Absolute Deviation (MAD): 5
Skewness: 0.064769631
Sum: 53492
Variance: 35.634583
Monotonicity: Not monotonic
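Several of these figures are derived from one another, which makes them easy to cross-check. The sketch below recomputes the mean, coefficient of variation, variance, range, and IQR for Age from the other reported values, using the standard definitions (CV = standard deviation / mean, variance = standard deviation squared).

```python
# Reported values for Age (at enrolment).
total, n = 53492, 998
std = 5.9694709
q1, q3 = 49, 59
minimum, maximum = 41, 71

mean = total / n            # sum divided by the number of observations
cv = std / mean             # coefficient of variation
variance = std ** 2
value_range = maximum - minimum
iqr = q3 - q1

print(f"mean={mean:.6f} cv={cv:.8f} variance={variance:.6f}")
print(f"range={value_range} iqr={iqr}")
```

The results agree with the reported mean (53.599198), CV (0.11137239), variance (35.634583), range (30), and IQR (10) to the printed precision.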
Histogram with fixed size bins (bins=29)
Value               Count  Frequency (%)
53                     63  6.3%
55                     61  6.1%
48                     57  5.7%
62                     56  5.6%
49                     55  5.5%
50                     52  5.2%
52                     52  5.2%
47                     50  5.0%
54                     46  4.6%
60                     46  4.6%
Other values (19)     460  46.1%

Value               Count  Frequency (%)
41                      4  0.4%
42                      4  0.4%
43                     18  1.8%
44                     30  3.0%
45                     39  3.9%
46                     39  3.9%
47                     50  5.0%
48                     57  5.7%
49                     55  5.5%
50                     52  5.2%

Value               Count  Frequency (%)
71                      1  0.1%
68                      1  0.1%
67                      1  0.1%
66                      3  0.3%
65                      8  0.8%
64                     16  1.6%
63                     36  3.6%
62                     56  5.6%
61                     40  4.0%
60                     46  4.6%

Sex
Categorical

High correlation 

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size60.4 KiB
Male
501 
Female
497 

Length

Max length6
Median length4
Mean length4.995992
Min length4
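The fractional mean length follows from the two category counts reported above (501 "Male" at 4 characters, 497 "Female" at 6 characters). A quick check:

```python
# Category counts as reported for Sex.
counts = {"Male": 501, "Female": 497}

total_chars = sum(len(label) * n for label, n in counts.items())
total_rows = sum(counts.values())
mean_length = total_chars / total_rows

print(total_chars, f"{mean_length:.6f}")
```

The total matches the 4986 characters listed under Characters and Unicode, and the mean rounds to the reported 4.995992.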

Characters and Unicode

Total characters4986
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowFemale
2nd rowFemale
3rd rowFemale
4th rowFemale
5th rowFemale

Common Values

ValueCountFrequency (%)
Male501
50.2%
Female497
49.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
male501
50.2%
female497
49.8%

Most occurring characters

ValueCountFrequency (%)
e1495
30.0%
a998
20.0%
l998
20.0%
M501
 
10.0%
F497
 
10.0%
m497
 
10.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3988
80.0%
Uppercase Letter998
 
20.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1495
37.5%
a998
25.0%
l998
25.0%
m497
 
12.5%
Uppercase Letter
ValueCountFrequency (%)
M501
50.2%
F497
49.8%

Most occurring scripts

ValueCountFrequency (%)
Latin4986
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1495
30.0%
a998
20.0%
l998
20.0%
M501
 
10.0%
F497
 
10.0%
m497
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4986
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1495
30.0%
a998
20.0%
l998
20.0%
M501
 
10.0%
F497
 
10.0%
m497
 
10.0%

latitude
Categorical

Constant 

Distinct1
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size63.3 KiB
-26.2041
998 

Length

Max length8
Median length8
Mean length8
Min length8

Characters and Unicode

Total characters7984
Distinct characters7
Distinct categories3 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row-26.2041
2nd row-26.2041
3rd row-26.2041
4th row-26.2041
5th row-26.2041

Common Values

ValueCountFrequency (%)
-26.2041998
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
26.2041998
100.0%

Most occurring characters

ValueCountFrequency (%)
21996
25.0%
-998
12.5%
6998
12.5%
.998
12.5%
0998
12.5%
4998
12.5%
1998
12.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5988
75.0%
Dash Punctuation998
 
12.5%
Other Punctuation998
 
12.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
21996
33.3%
6998
16.7%
0998
16.7%
4998
16.7%
1998
16.7%
Dash Punctuation
ValueCountFrequency (%)
-998
100.0%
Other Punctuation
ValueCountFrequency (%)
.998
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common7984
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
21996
25.0%
-998
12.5%
6998
12.5%
.998
12.5%
0998
12.5%
4998
12.5%
1998
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII7984
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
21996
25.0%
-998
12.5%
6998
12.5%
.998
12.5%
0998
12.5%
4998
12.5%
1998
12.5%

longitude
Categorical

Constant 

Distinct1
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size62.4 KiB
28.0473
998 

Length

Max length7
Median length7
Mean length7
Min length7

Characters and Unicode

Total characters6986
Distinct characters7
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row28.0473
2nd row28.0473
3rd row28.0473
4th row28.0473
5th row28.0473

Common Values

ValueCountFrequency (%)
28.0473998
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
28.0473998
100.0%

Most occurring characters

ValueCountFrequency (%)
2998
14.3%
8998
14.3%
.998
14.3%
0998
14.3%
4998
14.3%
7998
14.3%
3998
14.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number5988
85.7%
Other Punctuation998
 
14.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2998
16.7%
8998
16.7%
0998
16.7%
4998
16.7%
7998
16.7%
3998
16.7%
Other Punctuation
ValueCountFrequency (%)
.998
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common6986
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2998
14.3%
8998
14.3%
.998
14.3%
0998
14.3%
4998
14.3%
7998
14.3%
3998
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII6986
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2998
14.3%
8998
14.3%
.998
14.3%
0998
14.3%
4998
14.3%
7998
14.3%
3998
14.3%

province
Categorical

Constant 

Distinct1
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size62.4 KiB
Gauteng
998 

Length

Max length7
Median length7
Mean length7
Min length7

Characters and Unicode

Total characters6986
Distinct characters7
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowGauteng
2nd rowGauteng
3rd rowGauteng
4th rowGauteng
5th rowGauteng

Common Values

ValueCountFrequency (%)
Gauteng998
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
gauteng998
100.0%

Most occurring characters

ValueCountFrequency (%)
G998
14.3%
a998
14.3%
u998
14.3%
t998
14.3%
e998
14.3%
n998
14.3%
g998
14.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5988
85.7%
Uppercase Letter998
 
14.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a998
16.7%
u998
16.7%
t998
16.7%
e998
16.7%
n998
16.7%
g998
16.7%
Uppercase Letter
ValueCountFrequency (%)
G998
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6986
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
G998
14.3%
a998
14.3%
u998
14.3%
t998
14.3%
e998
14.3%
n998
14.3%
g998
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII6986
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
G998
14.3%
a998
14.3%
u998
14.3%
t998
14.3%
e998
14.3%
n998
14.3%
g998
14.3%

city
Categorical

Constant 

Distinct1
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size67.2 KiB
Johannesburg
998 

Length

Max length12
Median length12
Mean length12
Min length12

Characters and Unicode

Total characters11976
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJohannesburg
2nd rowJohannesburg
3rd rowJohannesburg
4th rowJohannesburg
5th rowJohannesburg

Common Values

ValueCountFrequency (%)
Johannesburg998
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
johannesburg998
100.0%

Most occurring characters

ValueCountFrequency (%)
n1996
16.7%
J998
8.3%
o998
8.3%
h998
8.3%
a998
8.3%
e998
8.3%
s998
8.3%
b998
8.3%
u998
8.3%
r998
8.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter10978
91.7%
Uppercase Letter998
 
8.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n1996
18.2%
o998
9.1%
h998
9.1%
a998
9.1%
e998
9.1%
s998
9.1%
b998
9.1%
u998
9.1%
r998
9.1%
g998
9.1%
Uppercase Letter
ValueCountFrequency (%)
J998
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin11976
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n1996
16.7%
J998
8.3%
o998
8.3%
h998
8.3%
a998
8.3%
e998
8.3%
s998
8.3%
b998
8.3%
u998
8.3%
r998
8.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII11976
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n1996
16.7%
J998
8.3%
o998
8.3%
h998
8.3%
a998
8.3%
e998
8.3%
s998
8.3%
b998
8.3%
u998
8.3%
r998
8.3%

jhb_subregion
Categorical

Constant 

Distinct1
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size66.3 KiB
Central_JHB
998 

Length

Max length11
Median length11
Mean length11
Min length11

Characters and Unicode

Total characters10978
Distinct characters11
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCentral_JHB
2nd rowCentral_JHB
3rd rowCentral_JHB
4th rowCentral_JHB
5th rowCentral_JHB

Common Values

ValueCountFrequency (%)
Central_JHB998
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
central_jhb998
100.0%

Most occurring characters

ValueCountFrequency (%)
C998
9.1%
e998
9.1%
n998
9.1%
t998
9.1%
r998
9.1%
a998
9.1%
l998
9.1%
_998
9.1%
J998
9.1%
H998
9.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5988
54.5%
Uppercase Letter3992
36.4%
Connector Punctuation998
 
9.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e998
16.7%
n998
16.7%
t998
16.7%
r998
16.7%
a998
16.7%
l998
16.7%
Uppercase Letter
ValueCountFrequency (%)
C998
25.0%
J998
25.0%
H998
25.0%
B998
25.0%
Connector Punctuation
ValueCountFrequency (%)
_998
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin9980
90.9%
Common998
 
9.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
C998
10.0%
e998
10.0%
n998
10.0%
t998
10.0%
r998
10.0%
a998
10.0%
l998
10.0%
J998
10.0%
H998
10.0%
B998
10.0%
Common
ValueCountFrequency (%)
_998
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII10978
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C998
9.1%
e998
9.1%
n998
9.1%
t998
9.1%
r998
9.1%
a998
9.1%
l998
9.1%
_998
9.1%
J998
9.1%
H998
9.1%

BMI (kg/m²)
Real number (ℝ)

High correlation 

Body Mass Index (extreme values removed)

Distinct816
Distinct (%)82.2%
Missing5
Missing (%)0.5%
Infinite0
Infinite (%)0.0%
Mean29.560846
Minimum15.24
Maximum65.89
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size15.6 KiB

Quantile statistics

Minimum15.24
5-th percentile18.286
Q123.53
median29.07
Q334.52
95-th percentile42.696
Maximum65.89
Range50.65
Interquartile range (IQR)10.99

Descriptive statistics

Standard deviation7.7932914
Coefficient of variation (CV)0.2636356
Kurtosis0.67413325
Mean29.560846
Median Absolute Deviation (MAD)5.49
Skewness0.63362711
Sum29353.92
Variance60.735391
MonotonicityNot monotonic
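The High correlation alert linking BMI to weight_kg is expected, because BMI is defined as weight in kilograms divided by the square of height in metres. A minimal sketch of that definition follows; the function name `bmi` and the example inputs are illustrative.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index in kg/m²: weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


# Illustrative inputs, not a row from the dataset.
print(round(bmi(70.0, 1.75), 2))
```

Applying the formula to the reported column means (79.47718 kg and 1.646858 m) gives about 29.3, close to but not equal to the reported mean BMI of 29.56, since the mean of per-patient ratios generally differs from the ratio of means.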
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
32.473
 
0.3%
34.763
 
0.3%
31.263
 
0.3%
32.573
 
0.3%
30.943
 
0.3%
26.963
 
0.3%
28.553
 
0.3%
27.853
 
0.3%
30.53
 
0.3%
37.23
 
0.3%
Other values (806)963
96.5%
(Missing)5
 
0.5%
ValueCountFrequency (%)
15.241
0.1%
15.31
0.1%
15.391
0.1%
15.461
0.1%
15.691
0.1%
15.851
0.1%
15.931
0.1%
16.141
0.1%
16.211
0.1%
16.341
0.1%
ValueCountFrequency (%)
65.891
0.1%
62.841
0.1%
58.621
0.1%
56.421
0.1%
56.091
0.1%
53.911
0.1%
53.851
0.1%
53.131
0.1%
52.891
0.1%
51.371
0.1%

weight_kg
Real number (ℝ)

High correlation 

Distinct529
Distinct (%)53.3%
Missing5
Missing (%)0.5%
Infinite0
Infinite (%)0.0%
Mean79.47718
Minimum37
Maximum168.8
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size15.6 KiB

Quantile statistics

Minimum37
5-th percentile51.9
Q165.2
median78.1
Q390.9
95-th percentile113
Maximum168.8
Range131.8
Interquartile range (IQR)25.7

Descriptive statistics

Standard deviation19.199436
Coefficient of variation (CV)0.24157168
Kurtosis0.76277468
Mean79.47718
Median Absolute Deviation (MAD)12.9
Skewness0.63115531
Sum78920.84
Variance368.61833
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
78.87
 
0.7%
77.26
 
0.6%
74.26
 
0.6%
66.66
 
0.6%
63.95
 
0.5%
85.35
 
0.5%
90.65
 
0.5%
80.75
 
0.5%
745
 
0.5%
60.55
 
0.5%
Other values (519)938
94.0%
ValueCountFrequency (%)
371
0.1%
38.31
0.1%
39.71
0.1%
40.71
0.1%
40.91
0.1%
42.21
0.1%
42.41
0.1%
42.91
0.1%
43.51
0.1%
43.81
0.1%
ValueCountFrequency (%)
168.81
0.1%
162.21
0.1%
153.91
0.1%
144.51
0.1%
1431
0.1%
136.81
0.1%
134.81
0.1%
134.71
0.1%
133.61
0.1%
133.41
0.1%

height_m
Real number (ℝ)

High correlation 

Distinct50
Distinct (%)5.0%
Missing5
Missing (%)0.5%
Infinite0
Infinite (%)0.0%
Mean1.646858
Minimum1.39
Maximum1.92
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size15.6 KiB

Quantile statistics

Minimum1.39
5-th percentile1.5
Q11.58
median1.64
Q31.71
95-th percentile1.79
Maximum1.92
Range0.53
Interquartile range (IQR)0.13

Descriptive statistics

Standard deviation0.090536021
Coefficient of variation (CV)0.054975001
Kurtosis-0.40941396
Mean1.646858
Median Absolute Deviation (MAD)0.07
Skewness0.11788112
Sum1635.33
Variance0.0081967711
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1.5853
 
5.3%
1.6342
 
4.2%
1.7442
 
4.2%
1.6239
 
3.9%
1.738
 
3.8%
1.5938
 
3.8%
1.6938
 
3.8%
1.6438
 
3.8%
1.6737
 
3.7%
1.5737
 
3.7%
Other values (40)591
59.2%
ValueCountFrequency (%)
1.391
 
0.1%
1.443
 
0.3%
1.456
 
0.6%
1.469
0.9%
1.475
 
0.5%
1.489
0.9%
1.497
0.7%
1.511
1.1%
1.516
 
0.6%
1.5216
1.6%
ValueCountFrequency (%)
1.921
 
0.1%
1.912
 
0.2%
1.91
 
0.1%
1.891
 
0.1%
1.881
 
0.1%
1.873
 
0.3%
1.861
 
0.1%
1.853
 
0.3%
1.842
 
0.2%
1.8310
1.0%

systolic_bp_mmHg
Real number (ℝ)

High correlation 

Distinct196
Distinct (%)19.8%
Missing7
Missing (%)0.7%
Infinite0
Infinite (%)0.0%
Mean132.32341
Minimum81
Maximum258.5
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size15.6 KiB

Quantile statistics

Minimum81
5-th percentile103
Q1117.75
median130
Q3144
95-th percentile169.75
Maximum258.5
Range177.5
Interquartile range (IQR)26.25

Descriptive statistics

Standard deviation21.355436
Coefficient of variation (CV)0.16138819
Kurtosis2.2537289
Mean132.32341
Median Absolute Deviation (MAD)13
Skewness0.92951207
Sum131132.5
Variance456.05464
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
12316
 
1.6%
124.515
 
1.5%
117.515
 
1.5%
136.515
 
1.5%
11915
 
1.5%
118.514
 
1.4%
126.514
 
1.4%
12414
 
1.4%
142.513
 
1.3%
134.513
 
1.3%
Other values (186)847
84.9%
ValueCountFrequency (%)
811
0.1%
851
0.1%
86.51
0.1%
871
0.1%
87.51
0.1%
881
0.1%
88.51
0.1%
891
0.1%
89.51
0.1%
902
0.2%
ValueCountFrequency (%)
258.51
0.1%
2391
0.1%
2151
0.1%
2131
0.1%
2111
0.1%
208.51
0.1%
207.51
0.1%
206.51
0.1%
193.51
0.1%
1921
0.1%

diastolic_bp_mmHg
Real number (ℝ)

High correlation 

Distinct133
Distinct (%)13.4%
Missing7
Missing (%)0.7%
Infinite0
Infinite (%)0.0%
Mean88.397074
Minimum48.5
Maximum150
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size15.6 KiB

Quantile statistics

Minimum48.5
5-th percentile70
Q180
median87.5
Q396
95-th percentile109.5
Maximum150
Range101.5
Interquartile range (IQR)16

Descriptive statistics

Standard deviation12.449453
Coefficient of variation (CV)0.14083558
Kurtosis1.1474371
Mean88.397074
Median Absolute Deviation (MAD)8
Skewness0.55858342
Sum87601.5
Variance154.98889
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
79.523
 
2.3%
8623
 
2.3%
87.522
 
2.2%
9622
 
2.2%
83.521
 
2.1%
8421
 
2.1%
8320
 
2.0%
90.520
 
2.0%
9020
 
2.0%
86.520
 
2.0%
Other values (123)779
78.1%
ValueCountFrequency (%)
48.51
0.1%
561
0.1%
571
0.1%
58.51
0.1%
601
0.1%
60.51
0.1%
612
0.2%
61.51
0.1%
62.51
0.1%
632
0.2%
ValueCountFrequency (%)
1501
0.1%
141.51
0.1%
133.51
0.1%
131.51
0.1%
1311
0.1%
1301
0.1%
129.51
0.1%
125.52
0.2%
1251
0.1%
122.52
0.2%

total_cholesterol_mg_dL
Real number (ℝ)

Missing 

Distinct369
Distinct (%)38.0%
Missing26
Missing (%)2.6%
Infinite0
Infinite (%)0.0%
Mean4.3946914
Minimum1.6
Maximum11.98
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size15.6 KiB
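The Distinct (%) figures for variables with missing values appear to use the non-missing count as the denominator, while Missing (%) uses all rows. This is inferred from the reported numbers rather than stated by the report; the helper names below are hypothetical.

```python
def distinct_pct(distinct, n_rows, n_missing):
    """Distinct values as a share of non-missing rows (inferred convention)."""
    return 100 * distinct / (n_rows - n_missing)


def missing_pct(n_missing, n_rows):
    """Missing values as a share of all rows."""
    return 100 * n_missing / n_rows


# total_cholesterol_mg_dL: 369 distinct, 26 missing, 998 rows
print(f"{distinct_pct(369, 998, 26):.1f}%", f"{missing_pct(26, 998):.1f}%")
# BMI (kg/m²): 816 distinct, 5 missing, 998 rows
print(f"{distinct_pct(816, 998, 5):.1f}%", f"{missing_pct(5, 998):.1f}%")
```

Under this convention the results round to the reported 38.0% and 82.2%; dividing by all 998 rows instead would give about 37.0% and 81.8%.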

Quantile statistics

Minimum1.6
5-th percentile2.8555
Q13.68
median4.35
Q35.02
95-th percentile6.1745
Maximum11.98
Range10.38
Interquartile range (IQR)1.34

Descriptive statistics

Standard deviation1.0298543
Coefficient of variation (CV)0.23434052
Kurtosis2.8989069
Mean4.3946914
Median Absolute Deviation (MAD)0.67
Skewness0.72446762
Sum4271.64
Variance1.0605998
MonotonicityNot monotonic
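The variable name states mg/dL, yet the reported values (median 4.35, maximum 11.98) are of a magnitude more typical of total cholesterol expressed in mmol/L. The report does not confirm which unit is stored, so the sketch below is only a plausibility check under that assumption. It uses the standard conversion of roughly 38.67 mg/dL per mmol/L for cholesterol; the function names and the threshold of 20 are hypothetical choices.

```python
# Standard approximate conversion factor for cholesterol.
CHOLESTEROL_MG_DL_PER_MMOL_L = 38.67


def cholesterol_mmol_l_to_mg_dl(value_mmol_l):
    """Convert a total cholesterol value from mmol/L to mg/dL."""
    return value_mmol_l * CHOLESTEROL_MG_DL_PER_MMOL_L


def looks_like_mmol_l(median_value, threshold=20.0):
    """Heuristic only: a median below `threshold` is implausibly low for mg/dL."""
    return median_value < threshold


reported_median = 4.35
if looks_like_mmol_l(reported_median):
    converted = cholesterol_mmol_l_to_mg_dl(reported_median)
    print(f"If stored in mmol/L, the median is about {converted:.0f} mg/dL")
```

The unit should be confirmed against the source data dictionary before any column is renamed or converted.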
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
4.1111
 
1.1%
4.0410
 
1.0%
4.759
 
0.9%
3.758
 
0.8%
4.458
 
0.8%
5.427
 
0.7%
3.557
 
0.7%
4.377
 
0.7%
4.097
 
0.7%
4.087
 
0.7%
Other values (359)891
89.3%
(Missing)26
 
2.6%
ValueCountFrequency (%)
1.61
0.1%
1.81
0.1%
1.821
0.1%
21
0.1%
2.151
0.1%
2.181
0.1%
2.21
0.1%
2.281
0.1%
2.331
0.1%
2.351
0.1%
ValueCountFrequency (%)
11.981
0.1%
7.821
0.1%
7.711
0.1%
7.61
0.1%
7.431
0.1%
7.31
0.1%
7.241
0.1%
7.21
0.1%
7.091
0.1%
7.082
0.2%

Triglycerides (mg/dL)
Real number (ℝ)

Missing 

Distinct207
Distinct (%)21.3%
Missing26
Missing (%)2.6%
Infinite0
Infinite (%)0.0%
Mean1.0413786
Minimum0.22
Maximum10.42
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size15.6 KiB

Quantile statistics

Minimum0.22
5-th percentile0.42
Q10.64
median0.85
Q31.21
95-th percentile2.1545
Maximum10.42
Range10.2
Interquartile range (IQR)0.57

Descriptive statistics

Standard deviation0.74350213
Coefficient of variation (CV)0.71395949
Kurtosis38.239011
Mean1.0413786
Median Absolute Deviation (MAD)0.255
Skewness4.6665268
Sum1012.22
Variance0.55279542
MonotonicityNot monotonic
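With skewness near 4.7 and kurtosis near 38, the standard deviation (0.74) is dominated by a few large values, while the Median Absolute Deviation (0.255) is much less affected. The sketch below shows the usual unscaled MAD definition, the median of absolute deviations from the median; the sample values are hypothetical and chosen only to show the effect of one outlier.

```python
from statistics import median, pstdev


def mad(values):
    """Median of absolute deviations from the median (no scaling constant)."""
    values = list(values)
    centre = median(values)
    return median(abs(v - centre) for v in values)


# Hypothetical values with one extreme observation.
sample = [1, 2, 3, 4, 100]
print("MAD:", mad(sample))
print("Population std:", round(pstdev(sample), 2))
```

The single outlier inflates the standard deviation considerably but leaves the MAD unchanged, which is why the two measures diverge so strongly for this variable.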
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.8217
 
1.7%
0.8117
 
1.7%
0.7116
 
1.6%
0.6616
 
1.6%
0.6915
 
1.5%
0.8314
 
1.4%
0.5414
 
1.4%
0.6713
 
1.3%
0.6513
 
1.3%
0.7813
 
1.3%
Other values (197)824
82.6%
(Missing)26
 
2.6%
ValueCountFrequency (%)
0.221
 
0.1%
0.251
 
0.1%
0.281
 
0.1%
0.32
 
0.2%
0.322
 
0.2%
0.334
0.4%
0.344
0.4%
0.356
0.6%
0.362
 
0.2%
0.375
0.5%
ValueCountFrequency (%)
10.421
0.1%
7.241
0.1%
5.62
0.2%
5.451
0.1%
5.241
0.1%
5.131
0.1%
5.011
0.1%
4.541
0.1%
4.41
0.1%
4.061
0.1%

cd4_correction_applied
Categorical

Constant 

Quality flag: CD4 missing codes removed

Distinct1
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size58.5 KiB
0.0
998 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters2994
Distinct characters2
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

ValueCountFrequency (%)
0.0998
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0.0998
100.0%

Most occurring characters

Value  Count  Frequency (%)
0      1996   66.7%
.      998    33.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     1996   66.7%
Other Punctuation  998    33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      1996   100.0%
Other Punctuation
Value  Count  Frequency (%)
.      998    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  2994   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      1996   66.7%
.      998    33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2994   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      1996   66.7%
.      998    33.3%

final_comprehensive_fix_applied
Categorical

Constant 

Quality flag: Comprehensive corrections applied

Distinct      1
Distinct (%)  0.1%
Missing       0
Missing (%)   0.0%
Memory size   58.5 KiB

Value  Count
1.0    998

Length

Max length     3
Median length  3
Mean length    3
Min length     3

Characters and Unicode

Total characters     2994
Distinct characters  3
Distinct categories  2
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  1.0
2nd row  1.0
3rd row  1.0
4th row  1.0
5th row  1.0

Common Values

Value  Count  Frequency (%)
1.0    998    100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
1.0    998    100.0%

Most occurring characters

Value  Count  Frequency (%)
1      998    33.3%
.      998    33.3%
0      998    33.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     1996   66.7%
Other Punctuation  998    33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      998    50.0%
0      998    50.0%
Other Punctuation
Value  Count  Frequency (%)
.      998    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  2994   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      998    33.3%
.      998    33.3%
0      998    33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2994   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      998    33.3%
.      998    33.3%
0      998    33.3%

waist_circ_unit_correction_applied
Boolean

Constant 

Quality flag: Waist circ unit corrected

Distinct      1
Distinct (%)  0.1%
Missing       0
Missing (%)   0.0%
Memory size   8.8 KiB

Value  Count  Frequency (%)
False  998    100.0%
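
All three quality-flag variables above are marked Constant because each holds a single distinct value across all 998 rows. The sketch below shows one way such columns could be detected. It uses plain Python and a hypothetical three-row miniature of the dataset; the helper name `constant_columns` is illustrative and is not how the profiling tool itself is implemented.

```python
# Hypothetical miniature of the dataset: column name -> list of values.
# The flag values mirror the report; weight_kg is included as a
# non-constant column for contrast.
columns = {
    "cd4_correction_applied": [0.0, 0.0, 0.0],
    "final_comprehensive_fix_applied": [1.0, 1.0, 1.0],
    "waist_circ_unit_correction_applied": [False, False, False],
    "weight_kg": [74.8, 88.8, 59.4],
}


def constant_columns(table):
    """Return the names of columns that contain exactly one distinct value."""
    return [name for name, values in table.items() if len(set(values)) == 1]


flagged = constant_columns(columns)
print(flagged)
```

A constant column carries no information for modelling or correlation analysis, which is why such variables are typically dropped or kept only as provenance metadata.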

Interactions

[Figures: 64 pairwise interaction plots; images not preserved]

Correlations

[Figure: correlation heatmap]

Variable | Age (at enrolment) | BMI (kg/m²) | Sex | Triglycerides (mg/dL) | diastolic_bp_mmHg | height_m | systolic_bp_mmHg | total_cholesterol_mg_dL | weight_kg
Age (at enrolment) | 1.000 | 0.120 | 0.138 | 0.045 | 0.010 | -0.052 | 0.200 | 0.091 | 0.118
BMI (kg/m²) | 0.120 | 1.000 | 0.507 | 0.134 | 0.185 | -0.424 | 0.188 | 0.111 | 0.901
Sex | 0.138 | 0.507 | 1.000 | 0.165 | 0.019 | 0.759 | 0.053 | 0.152 | 0.248
Triglycerides (mg/dL) | 0.045 | 0.134 | 0.165 | 1.000 | 0.151 | 0.121 | 0.163 | 0.270 | 0.208
diastolic_bp_mmHg | 0.010 | 0.185 | 0.019 | 0.151 | 1.000 | -0.001 | 0.797 | 0.083 | 0.198
height_m | -0.052 | -0.424 | 0.759 | 0.121 | -0.001 | 1.000 | -0.002 | -0.108 | -0.021
systolic_bp_mmHg | 0.200 | 0.188 | 0.053 | 0.163 | 0.797 | -0.002 | 1.000 | 0.084 | 0.199
total_cholesterol_mg_dL | 0.091 | 0.111 | 0.152 | 0.270 | 0.083 | -0.108 | 0.084 | 1.000 | 0.075
weight_kg | 0.118 | 0.901 | 0.248 | 0.208 | 0.198 | -0.021 | 0.199 | 0.075 | 1.000
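
To read the matrix above more quickly, the strongly correlated pairs can be extracted by thresholding the absolute coefficient. The sketch below does this in plain Python with the coefficients copied from the matrix. The threshold of 0.5 and the helper name `high_correlation_pairs` are assumptions chosen for illustration, not settings confirmed by the report.

```python
names = [
    "Age (at enrolment)", "BMI (kg/m²)", "Sex", "Triglycerides (mg/dL)",
    "diastolic_bp_mmHg", "height_m", "systolic_bp_mmHg",
    "total_cholesterol_mg_dL", "weight_kg",
]

# Coefficients copied row by row from the correlation matrix in the report.
matrix = [
    [1.000, 0.120, 0.138, 0.045, 0.010, -0.052, 0.200, 0.091, 0.118],
    [0.120, 1.000, 0.507, 0.134, 0.185, -0.424, 0.188, 0.111, 0.901],
    [0.138, 0.507, 1.000, 0.165, 0.019, 0.759, 0.053, 0.152, 0.248],
    [0.045, 0.134, 0.165, 1.000, 0.151, 0.121, 0.163, 0.270, 0.208],
    [0.010, 0.185, 0.019, 0.151, 1.000, -0.001, 0.797, 0.083, 0.198],
    [-0.052, -0.424, 0.759, 0.121, -0.001, 1.000, -0.002, -0.108, -0.021],
    [0.200, 0.188, 0.053, 0.163, 0.797, -0.002, 1.000, 0.084, 0.199],
    [0.091, 0.111, 0.152, 0.270, 0.083, -0.108, 0.084, 1.000, 0.075],
    [0.118, 0.901, 0.248, 0.208, 0.198, -0.021, 0.199, 0.075, 1.000],
]


def high_correlation_pairs(names, matrix, threshold=0.5):
    """List (name_a, name_b, coefficient) for each pair above the threshold.

    Only the upper triangle is scanned, so each pair appears once and the
    diagonal (self-correlation of 1.0) is skipped.
    """
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


pairs = high_correlation_pairs(names, matrix)
for a, b, r in pairs:
    print(f"{a} ~ {b}: {r:.3f}")
```

With this assumed threshold, four pairs are reported: BMI with Sex, BMI with weight_kg, Sex with height_m, and diastolic_bp_mmHg with systolic_bp_mmHg.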

Missing values

[Figure: nullity count by column] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure: nullity correlation heatmap] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
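
Nullity correlation can be illustrated by converting each column into a 0/1 missing indicator and correlating the indicators. The sketch below is a simplified illustration in plain Python, not the profiling tool's own implementation. The four rows reuse blood-pressure and weight values from the sample rows of this report (indices 1973 to 1976), with `None` standing in for NaN; the helper names are illustrative.

```python
import math

# Four rows taken from the report's sample; None stands in for NaN.
rows = [
    {"systolic_bp_mmHg": 119.0, "diastolic_bp_mmHg": 94.5, "weight_kg": 92.9},
    {"systolic_bp_mmHg": 130.0, "diastolic_bp_mmHg": 88.5, "weight_kg": 74.6},
    {"systolic_bp_mmHg": None, "diastolic_bp_mmHg": None, "weight_kg": 90.5},
    {"systolic_bp_mmHg": 136.5, "diastolic_bp_mmHg": 103.5, "weight_kg": 63.2},
]


def missing_indicator(rows, column):
    """Return 1 where the column is missing in a row, else 0."""
    return [1 if row[column] is None else 0 for row in rows]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Both sequences must vary; a column with no missing values (or only
    missing values) has no defined nullity correlation.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Count of missing entries per column.
missing_counts = {col: sum(missing_indicator(rows, col)) for col in rows[0]}
print(missing_counts)

# Nullity correlation between the two blood-pressure columns.
nullity_corr = pearson(
    missing_indicator(rows, "systolic_bp_mmHg"),
    missing_indicator(rows, "diastolic_bp_mmHg"),
)
print(round(nullity_corr, 3))
```

In these rows the two blood-pressure readings are missing together, so their nullity correlation is 1; weight_kg has no missing values and is therefore left out of the correlation.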

Sample

(index) | anonymous_patient_id | Patient ID | study_source | primary_date | Age (at enrolment) | Sex | latitude | longitude | province | city | jhb_subregion | BMI (kg/m²) | weight_kg | height_m | systolic_bp_mmHg | diastolic_bp_mmHg | total_cholesterol_mg_dL | Triglycerides (mg/dL) | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied
985 | HEAT_EFCE0743072E | GSK1001 | JHB_DPHRU_053 | 2017-01-26 | 62.0 | Female | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 26.78 | 74.8 | 1.67 | 148.5 | 89.0 | 5.58 | 1.64 | 0.0 | 1.0 | False
986 | HEAT_F3BA2B285DB1 | GSK1003 | JHB_DPHRU_053 | 2017-02-11 | 54.0 | Female | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 29.52 | 88.8 | 1.73 | 149.5 | 100.5 | 2.91 | 1.46 | 0.0 | 1.0 | False
987 | HEAT_2B8BDBC0C1EE | GSK1004 | JHB_DPHRU_053 | 2017-01-23 | 62.0 | Female | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 17.77 | 59.4 | 1.83 | 154.0 | 93.0 | 4.04 | 0.50 | 0.0 | 1.0 | False
988 | HEAT_E3EC25AD8189 | GSK1006 | JHB_DPHRU_053 | 2017-01-27 | 54.0 | Female | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 20.45 | 48.6 | 1.54 | 124.5 | 72.0 | 4.28 | 0.80 | 0.0 | 1.0 | False
989 | HEAT_17FEBF78F855 | GSK1007 | JHB_DPHRU_053 | 2017-01-31 | 58.0 | Female | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 33.99 | 98.2 | 1.70 | 117.5 | 72.5 | 5.22 | 0.90 | 0.0 | 1.0 | False
990 | HEAT_5FE7A2FC6A9C | GSK1008 | JHB_DPHRU_053 | 2017-01-31 | 58.0 | Female | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 23.46 | 72.8 | 1.76 | 134.0 | 84.0 | 4.62 | 0.73 | 0.0 | 1.0 | False
991 | HEAT_0058EDDD14FA | GSK1010 | JHB_DPHRU_053 | 2017-02-15 | 60.0 | Female | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 21.40 | 58.4 | 1.65 | 139.0 | 86.0 | 2.87 | 5.24 | 0.0 | 1.0 | False
992 | HEAT_A686658CD4F5 | GSK1011 | JHB_DPHRU_053 | 2017-02-03 | 59.0 | Female | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 20.78 | 62.2 | 1.73 | 104.0 | 75.0 | 3.07 | 1.31 | 0.0 | 1.0 | False
993 | HEAT_C8D4CE31D3F4 | GSK1013 | JHB_DPHRU_053 | 2017-01-25 | 53.0 | Female | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 21.30 | 56.2 | 1.62 | 107.0 | 77.0 | 3.86 | 0.81 | 0.0 | 1.0 | False
994 | HEAT_B8B546F9EE15 | GSK1014 | JHB_DPHRU_053 | 2017-03-15 | 59.0 | Female | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 26.00 | 80.6 | 1.76 | 139.5 | 88.5 | 5.56 | 1.30 | 0.0 | 1.0 | False

(index) | anonymous_patient_id | Patient ID | study_source | primary_date | Age (at enrolment) | Sex | latitude | longitude | province | city | jhb_subregion | BMI (kg/m²) | weight_kg | height_m | systolic_bp_mmHg | diastolic_bp_mmHg | total_cholesterol_mg_dL | Triglycerides (mg/dL) | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied
1973 | HEAT_CDED5B8CE2A9 | GSK10133 | JHB_DPHRU_053 | 2018-03-20 | 54.0 | Male | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 35.84 | 92.9 | 1.61 | 119.0 | 94.5 | 5.83 | 0.91 | 0.0 | 1.0 | False
1974 | HEAT_BFB34305F3B2 | GSK10134 | JHB_DPHRU_053 | 2018-04-24 | 60.0 | Male | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 27.86 | 74.6 | 1.64 | 130.0 | 88.5 | 4.61 | 0.73 | 0.0 | 1.0 | False
1975 | HEAT_10E35CE1F4FB | GSK10135 | JHB_DPHRU_053 | 2018-03-01 | 53.0 | Male | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 35.00 | 90.5 | 1.61 | NaN | NaN | NaN | NaN | 0.0 | 1.0 | False
1976 | HEAT_A84860D7AD5B | GSK10136 | JHB_DPHRU_053 | 2018-02-23 | 53.0 | Male | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 26.79 | 63.2 | 1.54 | 136.5 | 103.5 | 4.37 | 0.78 | 0.0 | 1.0 | False
1977 | HEAT_6B92951F28B3 | GSK10137 | JHB_DPHRU_053 | 2018-02-27 | 54.0 | Male | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 39.20 | 106.6 | 1.65 | 117.5 | 82.5 | 3.44 | 0.81 | 0.0 | 1.0 | False
1978 | HEAT_67AD7A632733 | GSK10138 | JHB_DPHRU_053 | 2017-07-05 | 46.0 | Male | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 26.31 | 62.9 | 1.55 | 154.0 | 111.5 | 4.24 | 0.47 | 0.0 | 1.0 | False
1979 | HEAT_5EFCFB217CAD | GSK10139 | JHB_DPHRU_053 | 2018-03-07 | 57.0 | Male | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 36.79 | 96.0 | 1.62 | 146.5 | 96.0 | 4.41 | 1.55 | 0.0 | 1.0 | False
1980 | HEAT_183B350F01FF | GSK10142 | JHB_DPHRU_053 | 2018-02-15 | 50.0 | Male | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 37.20 | 100.9 | 1.65 | 144.5 | 89.5 | 3.77 | 0.80 | 0.0 | 1.0 | False
1981 | HEAT_E31645991B0D | GSK10143 | JHB_DPHRU_053 | 2017-11-23 | 43.0 | Male | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 34.99 | 86.1 | 1.57 | 131.5 | 86.5 | 4.56 | 0.50 | 0.0 | 1.0 | False
1982 | HEAT_55C448A8437D | GSK10144 | JHB_DPHRU_053 | 2017-05-30 | 56.0 | Male | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 33.87 | 84.7 | 1.58 | 160.5 | 99.5 | 6.75 | 1.12 | 0.0 | 1.0 | False